Setup

Planning

Timing:

5 hours (excluding 2x30 min coffee breaks)

  • 4 blocks of 1 hour & 15 mins

  • Typically: 15 minutes talking and 1 hour of hands-on

Part A:

20-30 min intro (me, data, dataset(s), aims, phyloseq & microViz) -> bar plots & alpha diversity

Part B:

Beta diversity: distance/dissimilarity and taxon filtering (prev. & abund.) -> Ordination plots

Part C:

More Ordination (PERMANOVA) and taxa transformation (clr and log, for PCA and taxon stats)

Plots and statistics with individual taxa - (simple) “differential abundance testing”

Part D:

DADA2 denoising? Adapt the 2020 tutorial. Do I have time for this?

End workshop with recap and notes on current trends topics

SETUP

library(seriation)
library(dplyr)
library(ggplot2)
library(phyloseq)
library(microViz)
library(shiny)
# tiny bit of cleanup (ibd_phylo Species column was blank)
ibd <- corncob::ibd_phylo %>% tax_mutate(Species = NULL)
load(here::here('data/shao19.rda'))
shao19 <- tax_fix(shao19) %>% tax_names2rank("species")
## Row named: Candidatus Gastranaerophilales bacterium
## contains no non-unknown values, returning:
## 'Candidatus Gastranaerophilales bacterium' for all replaced levels.
## Consider editing this tax_table entry manually.

We’ll use two datasets today, both are about the human gut microbiome (cos that’s my field). The first is from an “old” study (2012), and the second is more recent (2019).

Each dataset was generated using different sequencing techniques, and different sequence data processing approaches. But the analyses we will learn today can be readily applied to both methods.

  • ibd is 16S gene amplicon sequencing data from a 2012 study of Inflammatory Bowel Disease in children and young adults.

  • It’s the old data: they used 454 Pyrosequencing, and clustered the raw sequences into “OTUs”.

  • shao19 is shotgun metagenomic data, from a 2019 study of infants born vaginally or by C-section.

  • They used Illumina HiSeq and the abundance of species-like features was inferred with metaphlan3.

So we’re starting in the middle. The data has already been processed from sequencing data (fastq files) into counts. But this is the exciting bit, you get to explore and visualize the data, and maybe do statistics (yay!).

Intro to phyloseq

This is a phyloseq S4 object, containing processed microbiota data from the inflammatory bowel disease study.

Get a little familiar with the object. What does it have in it? Can you look at each part?

The printed object shows you functions you can use to access the data inside.

You can also use the @ symbol.

ibd
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 36349 taxa and 91 samples ]
## sample_data() Sample Data:       [ 91 samples by 15 sample variables ]
## tax_table()   Taxonomy Table:    [ 36349 taxa by 6 taxonomic ranks ]
# View(ibd)
tax_table(ibd) %>% head()
## Taxonomy Table:     [6 taxa by 6 taxonomic ranks]:
##       Kingdom    Phylum          Class         Order          
## OTU.1 "Bacteria" "Firmicutes"    "Clostridia"  "Clostridiales"
## OTU.2 "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales"
## OTU.3 "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales"
## OTU.4 "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales"
## OTU.5 "Bacteria" "Firmicutes"    "Clostridia"  "Clostridiales"
## OTU.6 "Bacteria" "Firmicutes"    "Clostridia"  "Clostridiales"
##       Family            Genus             
## OTU.1 "Ruminococcaceae" "Faecalibacterium"
## OTU.2 "Bacteroidaceae"  "Bacteroides"     
## OTU.3 "Bacteroidaceae"  "Bacteroides"     
## OTU.4 "Prevotellaceae"  "Prevotella"      
## OTU.5 "Ruminococcaceae" "Faecalibacterium"
## OTU.6 "Lachnospiraceae" "Roseburia"
rank_names(ibd)
## [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"
otu_table(ibd)[1:15, 1:10] 
## OTU Table:          [15 taxa and 10 samples]
##                      taxa are rows
##        099A 199A 062B 194A 166A 219A 132A 026A 102A 140A
## OTU.1     0    0    0    0    0    0    0    0    0    0
## OTU.2     0    0    0    0    0    0    0    0    0    0
## OTU.3     0    0    0    0    0    0    0    0    0    0
## OTU.4     0    0    0    0    0    0    0    0    0    0
## OTU.5     0    0    0    0    0    0    0    0    0    0
## OTU.6     0    0    0    0    0    0    2    0    0    0
## OTU.7     0    0    0    0    0    0    0    2    0    0
## OTU.8     0    0    0    0    0    0    0    0    0    0
## OTU.9     0    0    0    0    0    0    0    0    0    0
## OTU.10    0    0    0    0    0    0    0    0    0    0
## OTU.11    0    0    0    0    0    0    0    2    0    0
## OTU.12    0    0    0    0    0    0    0    0    0    0
## OTU.13    0    0    0    0    0    0    0    0    0    0
## OTU.14    0    0    0    0    0    0    0    0    2    0
## OTU.15    0    0    0    0    0    0    2    0    0    0
# ibd@otu_table[1:15, 1:10] # the same result
sample_variables(ibd)
##  [1] "sample"       "gender"       "age"          "DiseaseState" "steroids"    
##  [6] "imsp"         "abx"          "mesalamine"   "ibd"          "activity"    
## [11] "active"       "race"         "fhx"          "imspLEVEL"    "SampleType"
sample_data(ibd)[1:15, 1:5]
##      sample gender age DiseaseState   steroids
## 099A 099-AX female  17           UC   steroids
## 199A 199-AX female   5       nonIBD nosteroids
## 062B 062-BZ   male  16           CD   steroids
## 194A 194-AZ   male  15           UC   steroids
## 166A 166-AX female  20           UC nosteroids
## 219A 219-AX female  18           UC nosteroids
## 132A 132-AX female  12           CD   steroids
## 026A 026-AX   male  10           UC   steroids
## 102A 102-AZ   male   3       nonIBD nosteroids
## 140A 140-AX female   5       nonIBD nosteroids
## 005A 005-AX   male  17       nonIBD nosteroids
## 080A 080-AX   male  14           CD   steroids
## 103A 103-AX   male  14           CD nosteroids
## 234A 234-AX female  17       nonIBD nosteroids
## 222A 222-AZ   male  11           UC   steroids
sample_names(ibd) %>% head(10) 
##  [1] "099A" "199A" "062B" "194A" "166A" "219A" "132A" "026A" "102A" "140A"

Looking at microbiome data

Okay, so how do we look at the microbiota abundance data? To do this, we’re going to use the R package microViz

Lets take a very small subset of this data to get started, just the 10 male controls.

# We can filter the samples like this, using the sample_data information
ibd %>% ps_filter(DiseaseState == "nonIBD", gender == 'male')
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 10986 taxa and 10 samples ]
## sample_data() Sample Data:       [ 10 samples by 15 sample variables ]
## tax_table()   Taxonomy Table:    [ 10986 taxa by 6 taxonomic ranks ]
ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  comp_barplot(
    tax_level = 'unique', n_taxa = 12, bar_width = 0.7,
    sample_order = 'asis', tax_transform_for_plot = 'identity'
  )

Rats. What is this junk? The unique taxa have uninformative IDs, and we also got a message about problems with the taxonomy table.

The total number of reads also varies a lot between samples! The total number of reads for each sample is NOT a reliable indicator of the biomass or bacterial load of each sample. So for now we will just consider the relative abundance of each taxon, as proportions of the total counts for that sample.

Let’s look at the taxonomy table first.

# tax_fix_interactive(ibd) # run this in the R Console for an interactive look

Looks like we just need to fill in some blank cells when a sequence was not classified at genus or family. tax_fix can do this, it just copies down info from a higher rank classification. Let’s update our ibd phyloseq object with this fix.

ibd <- tax_fix(ibd)

We can also rename the unique taxa with a more informative name, according to their classification at Genus level, and how common they are.

ibd %>% taxa_names() %>% head
## [1] "OTU.1" "OTU.2" "OTU.3" "OTU.4" "OTU.5" "OTU.6"
ibd <- tax_rename(ibd, rank = 'Genus')
ibd %>% taxa_names() %>% head
## [1] "Faecalibacterium 0561" "Bacteroides 2425"      "Bacteroides 2426"     
## [4] "Prevotella 0190"       "Faecalibacterium 0562" "Roseburia 0259"
ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  comp_barplot(
    tax_level = 'unique', n_taxa = 12, sample_order = 'asis', bar_width = 0.7
    )

Let’s try again with the better names.

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  comp_barplot(
    tax_level = 'unique', n_taxa = 12, sample_order = 'asis', bar_width = 0.7
    )

Sadly we don’t have enough distinct colours to show all the unique taxa. So let’s “aggregate” all the counts into genus-level groups.

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, bar_width = 0.7,
    sample_order = 'asis', merge_other = FALSE
  )

We still quickly run out of distinct colours to display all the different genera in a complex ecosystem, but we can still get an idea of which genera are the most abundant, and how variable the communities are, even within these healthy control males. By aggregating at genus level, we have sacrificed a little bit of taxonomic resolution.

Try making some similar plots aggregated at higher taxonomic ranks. Change the tax_level argument.

# ibd %>% 
#   ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
#   comp_barplot(tax_level = , n_taxa = 12, sample_order = 'asis', merge_other = FALSE)
ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  comp_barplot(
    tax_level = "Family", n_taxa = 12, bar_width = 0.7,
    sample_order = 'asis', merge_other = FALSE
  )

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  comp_barplot(
    tax_level = "Phylum", n_taxa = 12, bar_width = 0.7,
    sample_order = 'asis', merge_other = FALSE
  )

A note on phylum names! There have been major changes this year and some of these are now old names. Most published research is of course with the old names (and still probably will be for a year or so).

INSERT NAME CONVERSION TABLE.

Alpha diversity

How diverse is the bacterial microbiome of each sample?

Why is this interesting?

Biologically

  • Lower gut microbiome diversity is related to worse health in adult humans.
  • Higher diversity ecosystems are often considered healthier, more mature, and more resilient to perturbation.
  • BUT: diverse == healthy does not hold for all ecosystems, e.g. early infant gut microbiome, so consider your own data and hypotheses carefully.

Practically

Simple one number summary of ecosystem, easy to compare and do stats with.

Richness

The more the merrier. The simplest measure is just counting, aka. Observed Richness. Let’s compute the observed richness and label each sample with it.

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  ps_calc_richness(rank = 'Genus', index = 'observed', varname = 'N genera') %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, label = 'N genera', bar_width = 0.7,
    sample_order = 'asis', merge_other = FALSE#, tax_transform_for_plot = 'identity'
  )

Diversity

Richness and evenness matter. Shannon index, TODO: describe Shannon index and exp shannon.

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  ps_calc_diversity(rank = 'Genus', index = 'shannon') %>%
  ps_mutate(shannon_Genus = round(shannon_Genus, digits = 2)) %>% 
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, label = 'shannon_Genus', bar_width = 0.7,
    sample_order = 'asis', merge_other = FALSE
  )

Statistics with alpha diversity

So we have our alpha diversity values for the small set of control samples.

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  ps_calc_diversity(rank = 'Genus', index = 'shannon') %>%
  samdat_tbl() %>% 
  ggplot(aes(x = shannon_Genus, y = "Male\ncontrols")) + 
  geom_point(position = position_jitter(height = 0.2), alpha = 0.5) +
  labs(x = 'Shannon diversity (Genus)', y = NULL) +
  xlim(0.5, 3) +
  theme_bw()

Let’s calculate alpha diversity for all the samples, and make a comparison. I suspect that the average gut microbiota diversity of the healthy controls will differ from the IBD patients.

ibd %>% 
  ps_calc_diversity(rank = 'Genus', index = 'shannon') %>%
  samdat_tbl() %>% 
  ggplot(aes(y = ibd, x = shannon_Genus)) +
  geom_boxplot(width = 0.2) + 
  geom_point(position = position_jitter(height = 0.2), alpha = 0.5) +
  theme_bw()

It looks like the ibd patients have lower gut microbiota diversity on average. A simple statistical test supports this.

ibd %>% 
  ps_calc_diversity(rank = 'Genus', index = 'shannon') %>%
  samdat_tbl() %>% 
  wilcox.test(formula = shannon_Genus ~ ibd, data = .)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  shannon_Genus by ibd
## W = 457, p-value = 0.001804
## alternative hypothesis: true location shift is not equal to 0

You can apply more complex statistical tests as you like, e.g. adjusting for covariates with linear regression, using lm()

ibd %>% 
  ps_calc_diversity(rank = 'Genus', index = 'shannon') %>%
  samdat_tbl() %>% 
  lm(formula = shannon_Genus ~ ibd + age, data = .) %>% 
  summary()
## 
## Call:
## lm(formula = shannon_Genus ~ ibd + age, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.21597 -0.35101  0.07726  0.28235  0.89140 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.04788    0.17574  11.653   <2e-16 ***
## ibdnonibd    0.27034    0.12484   2.165   0.0331 *  
## age         -0.01555    0.01197  -1.299   0.1973    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4656 on 88 degrees of freedom
## Multiple R-squared:  0.1147, Adjusted R-squared:  0.09454 
## F-statistic: 5.699 on 2 and 88 DF,  p-value: 0.004707

Try out for yourself, how does alpha diversity relate to other participant characteristics?

Practice making plots and doing simple statistical tests. What about richness?

Notes on richness and readcount

TODO: move this note to end of diversity section as an extension. TMI, derails main learning points.

Simple approaches like Observed Richness are sensitive to what ecologists call “sampling effort”. For macroecologists, this is actually how much time/effort you spent trying to count all the organisms present in an ecosystem. In our case, the amount of total reads obtained represents the sampling effort: more reads, more effort. Indeed we can see that the samples with a much lower readcount have lower observed richness.

(Furthermore, as this richness estimate is based on a sample, and not the actual ecosystem, the richness estimate actually has quantifiable uncertainty too.)

ibd %>% 
  ps_filter(DiseaseState == "nonIBD", gender == 'male') %>% 
  ps_calc_richness(rank = 'Genus', index = 'observed', varname = 'N genera') %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 12, label = 'N genera', bar_width = 0.7,
    sample_order = 'asis', merge_other = FALSE, tax_transform_for_plot = 'identity'
  )

ibd %>% 
  ps_calc_richness(rank = 'Genus', index = 'observed', varname = 'genera') %>%
  ps_mutate(readcount = sample_sums(.env$ibd)) %>% 
  samdat_tbl() %>% 
  ggplot(aes(readcount, genera)) + 
  geom_point(alpha = 0.4, size = 2.5) +
  theme_bw(14)

What to do:

  1. Simple solution: Ignore the problem. Whilst you can’t interpret the richness of any individual sample as being correct, it is still usually valid to compare richness across groups of samples, as the readcount variation is only random noise, and should be uncorrelated with your grouping variable (but do check this).
  2. Harder solution: Explore more rigorous methods like breakaway by Amy Willis and team. TODO: INSERT LINK.

COFFEE BREAK 1 (30 min)


Beta diversity (part 1)

TODO - topic intro slides

So far, this has been relatively straightforward, at least conceptually. But there is a lot more interesting stuff we can do with microbiome data.

We’ve looked at one sample at a time and calculated and compared simple summary measures of sample alpha-diversity.

Alpha diversity is sometimes referred to as “within sample” diversity.

Now we’re going to look at “beta diversity”, or “between sample” diversity.

For this part we’re going to swap to another dataset. So you get a little bit more practice examining a phyloseq object. Look at the rank names, sample data variables etc.

shao19 # this object has another part!
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 819 taxa and 1644 samples ]
## sample_data() Sample Data:       [ 1644 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 819 taxa by 6 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 819 tips and 818 internal nodes ]
#

Filtering

First, we need to talk about filtering.

Sample filtering

You should check if any of your samples have a surprisingly low total number of (classified) reads. This can suggest that something went wrong in the lab (or during sample collection) and the data from this sample might be unreliable.

You might already do this check for total reads and remove poor quality samples during the fastq file processing.

shao19 %>% 
  ps_mutate(reads = sample_sums(shao19)) %>% 
  samdat_tbl() %>% 
  ggplot(aes(x = reads)) + 
  geom_freqpoly(bins = 500) +
  geom_rug(alpha = 0.5) +
  scale_x_log10(labels = scales::label_number()) +
  labs(x = 'Number of classified reads', y = NULL) +
  theme_bw()

How many is enough? There is no easy answer.

These samples have great depth. There are a few with much less reads than the rest, and a few with under a million. You might consider dropping the samples with under a million reads, to see if it affects your results, but in this case we won’t.

But 100,000 is still a lot, compared to what older sequencing machines produced: 1000 reads might have been considered very good. So look at the distribution for your data, in case there are obvious outliers, and look at recent papers using a similar sequencing technique for what kind of threshold they used.

There might also be relevant information for the type of sequencer you used on e.g. Illumina website. e.g. for this type of sequencing Illumina suggests you should expect at least a million reads (and this is good for RNA seq analyses). https://support.illumina.com/bulletins/2017/04/considerations-for-rna-seq-read-length-and-coverage-.html

If you are interested, go back and recreate this plot with the old 454 sequencing dataset ibd.

ibd %>% 
  ps_mutate(reads = sample_sums(.env$ibd)) %>% 
  samdat_tbl() %>% 
  ggplot(aes(x = reads)) + 
  geom_freqpoly(bins = 30) +
  geom_rug(alpha = 0.5) +
  scale_x_log10(labels = scales::label_number()) +
  labs(x = 'Number of classified reads', y = NULL) +
  theme_bw()

Taxon filtering

Okay, so we might remove “bad” samples, but how can a taxon be “bad”?

We probably want to filter out rare taxa, before performing some kinds of analysis.

Why remove rare taxa?

Rare taxa might be:

  1. Sequencing errors
  2. Statistically problematic (rare )
  3. Biologically irrelevant

More elaboration for presentation/discussion

  1. Rare taxa might be erroneous: sequencing errors, chimeric sequences…

  2. Statistical challenges:

    • There are often A LOT of different rare taxa
      • Sparsity, lots of zeros
      • Noise, more measurement error in very low abundance taxa
      • Multiple comparisons (we’ll get to that later)
    • Having many many taxa in your dataset can really slow down your analysis!

    TODO: COMPLETE

  3. Biological (ir)relevance:

    • For many (most?) microbiome research questions, rare taxa are less likely to be relevant
    • For common host health problems and the human/animal microbiome, if a taxon only occurs in every 100th person, it is less likely to be relevant to risk common disease outcomes (?)

I’m sure one can think of many cases where rare taxa are biologically relevant, but problems 1 and 2 are still there.

How to remove rare taxa?

What is rare? Two main concepts.

  • Low prevalence - taxon only detected in a small number of samples in your dataset.
  • Low abundance - relatively few reads assigned to that taxon (on average or in total)

Consider the impact of issues 1, 2, and 3. Let’s say we are not interested in unique taxa that occur in fewer than 2% of samples, and they have to have at least 10,000 reads in total across all samples.

# before filtering
ntaxa(shao19)
## [1] 819
# after filtering
shao19 %>% 
  tax_filter(min_prevalence = 2 / 100, min_total_abundance = 10000) %>% 
  ntaxa()
## Proportional min_prevalence given: 0.02 --> min 33/1644 samples.
## [1] 253

Wow so that would remove most of our unique taxa! What is going on? Let’s make some plots!

# make table of summary statistics for the unique taxa in shao19
shaoTaxaStats <- tibble(
  taxon = taxa_names(shao19),
  prevalence = microbiome::prevalence(shao19),
  total_abundance = taxa_sums(shao19)
)
p <- shaoTaxaStats %>% 
  ggplot(aes(total_abundance, prevalence)) + 
  geom_point(alpha = 0.5) +
  geom_rug(alpha = 0.1) +
  scale_x_continuous(labels = scales::label_number(), name = 'Total Abundance') + 
  scale_y_continuous(
    labels = scales::label_percent(), breaks = scales::breaks_pretty(n = 9),
    name = "Prevalence (%)",
    sec.axis = sec_axis(
      trans = ~ . * nsamples(shao19), breaks = scales::breaks_pretty(n = 9),
      name = "Prevalence (N samples)"
    )
  ) +
  theme_bw()
p

So most taxa have a low prevalence, and handful have way more reads than most.

Let’s quickly label those plots to check which taxa are the big time players.

p + ggrepel::geom_text_repel(
  data = function(df) filter(df, total_abundance > 1e9 | prevalence > 0.6),
  mapping = aes(label = taxon), size = 2.5, min.segment.length = 0, force = 15
)

That makes sense for this dataset of mostly infant gut microbiome samples.

Now let’s zoom in on the less abundant taxa by log-transforming the axes. We’ll also add lines indicating the thresholds of 2% prevalence and 10000 reads abundance.

shaoTaxaStats %>% 
  ggplot(aes(total_abundance, prevalence)) + 
  geom_vline(xintercept = 10000, color = 'red', linetype = 'dotted') +
  geom_hline(yintercept = 2/100, color = 'red', linetype = 'dotted') +
  geom_point(alpha = 0.5) +
  geom_rug(alpha = 0.1) +
  scale_x_log10(labels = scales::label_number(), name = 'Total Abundance') + 
  scale_y_log10(
    labels = scales::label_percent(), breaks = scales::breaks_log(n = 9),
    name = "Prevalence (%)",
    sec.axis = sec_axis(
      trans = ~ . * nsamples(shao19), breaks = scales::breaks_log(n = 9),
      name = "Prevalence (N samples)"
    )
  ) +
  theme_bw()

We can break this down by phylum if we add the taxonomic table information.

# don't worry about this code if it's confusing, just focus on the plot output
shao19 %>% 
  tax_table() %>% 
  as.data.frame() %>% 
  as_tibble(rownames = 'taxon') %>% 
  left_join(shaoTaxaStats, by = "taxon") %>% 
  add_count(phylum, name = 'phylum_count', sort = TRUE) %>% 
  mutate(phylum = factor(phylum, levels = unique(phylum))) %>% # to fix facet order
  mutate(phylum = forcats::fct_lump_n(phylum, n = 5) %>% forcats::fct_explicit_na('Other')) %>% 
  ggplot(aes(total_abundance, prevalence)) +
  geom_vline(xintercept = 10000, color = 'red', linetype = 'dotted') +
  geom_hline(yintercept = 2/100, color = 'red', linetype = 'dotted') +
  geom_point(alpha = 0.5, size = 1) +
  geom_rug(alpha = 0.2) +
  scale_x_log10(
    labels = scales::label_log(), breaks = scales::breaks_log(n = 5),
    name = 'Total Abundance'
  ) + 
  scale_y_log10(
    labels = scales::label_percent(), breaks = scales::breaks_log(n = 9),
    name = "Prevalence (%)",
    sec.axis = sec_axis(
      trans = ~ . * nsamples(shao19), breaks = scales::breaks_log(n = 9),
      name = "Prevalence (N samples)"
    )
  ) +
  facet_wrap('phylum') + 
  theme_bw(10)

How to pick a threshold?

Depends on what analysis method you are filtering for!

  • alpha diversity = DO NOT FILTER
  • beta diversity = relevance of threshold depends on your distance measure (next topic!)
  • differential abundance testing = stringent filtering, prevalence >5%, >10%? (last topic!)

Dissimilarity measures

TODO: Slides on different dissimilarity measures.

What are we doing? Calculating the dissimilarity of two samples’ compositions.

[Visualize distance heatmaps?]

  • Binary Jaccard
  • Bray-Curtis
  • UniFrac distances (unweighted, weighted, generalised)

To simplify and speed up the analyses, we’re going to take a smaller part of the dataset. We’ll only look at the 300 infant fecal samples from 4 days of age.

shao4 <- shao19 %>% ps_filter(family_role == 'child', infant_age == 4) 

We’re going to filter out rare taxa quite strictly, for similar reasons. But we won’t overwrite our smaller dataset: we’ll do the filtering per analysis.

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100) %>% 
  tax_agg(rank = 'genus') %>% 
  tax_transform('binary') %>% # converts counts to absence/presence: 0/1
  dist_calc(dist = 'jaccard') 
## Proportional min_prevalence given: 0.025 --> min 8/306 samples.
## ps_extra object - a list with phyloseq and extras:
## 
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 35 taxa and 306 samples ]
## sample_data() Sample Data:       [ 306 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 35 taxa by 5 taxonomic ranks ]
## 
## ps_extra info:
## tax_agg = genus tax_transform = binary
## 
## jaccard distance matrix of size 306 
## 0.6666667 0.7333333 0.9375 0.8125 0.6428571 ...
## 
## $counts OTU Table: [ 35 taxa and 306 samples ]

Talk about the output we see here. Then look at the distance matrix.

distances <- shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_agg(rank = 'genus') %>% 
  tax_transform('binary') %>%
  dist_calc(dist = 'jaccard') %>% 
  dist_get() 

as.matrix(distances)[1:5, 1:5]
##             B01089_ba_4 B01190_ba_4 B01194_ba_4 B01196_ba_4 B01235_ba_4
## B01089_ba_4   0.0000000   0.6666667   0.7333333   0.9375000   0.8125000
## B01190_ba_4   0.6666667   0.0000000   0.7500000   0.9166667   0.8461538
## B01194_ba_4   0.7333333   0.7500000   0.0000000   0.4615385   0.3076923
## B01196_ba_4   0.9375000   0.9166667   0.4615385   0.0000000   0.4615385
## B01235_ba_4   0.8125000   0.8461538   0.3076923   0.4615385   0.0000000
range(distances)
## [1] 0 1

Ordination

What do we do with these distances or dissimilarities? We make an ordination.

Ordination refers to the process of ordering things (in our case: samples), so that similar things (samples) are closer to each other, and dissimilar things (samples) are further away.

PCoA

Principal Co-ordinates Analysis is one kind of ordination.

Takes distance matrix and finds new dimensions (a co-ordinate system if you like)

More details this works

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_agg(rank = 'genus') %>% 
  dist_calc(dist = 'bray') %>% 
  ord_calc(method = 'PCoA') %>% 
  ord_plot(alpha = 0.6, size = 2) +
  theme_classic(12) +
  coord_fixed(0.7)

To get a little insight into what the happened here, we can colour each sample according to its dominant (most abundant) genus.

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  ps_calc_dominant(rank = 'genus', none = "Mixed", other = "Other") %>% 
  tax_agg(rank = 'genus') %>% 
  dist_calc(dist = 'bray') %>% 
  ord_calc(method = 'PCoA') %>% 
  ord_plot(color = "dominant_genus", alpha = 0.6, size = 2) +
  scale_color_brewer(name = "Dominant Genus", palette = 'Dark2') +
  theme_classic(12) +
  coord_fixed(0.7)

Interactive ordination!

microViz provides a Shiny app ord_explore to interactively create and explore PCoA plots and other ordinations. See the code below to get started.

Here are a few things to try:

  • Colour the samples using the variables in the sample data
  • Select a few samples to view their composition on barplots!
  • Change some ordination options:
    • Different rank of taxonomic aggregation
    • Different distances we’ve discussed
  • Copy the automatically generated code
    • Exit the app (press escape or click red button in R console!)
    • Paste and run the code to recreate the ordination plot
    • Customise the plot: change colour scheme, title, etc.
  • Launch the app again with a different subset of the data
    • Practice using ps_filter etc.
    • e.g. plot the data of the mothers’ gut microbiomes!
    • compute one or more alpha diversity measures

Beware:

  • UniFrac distances can be quite slow (over a minute) to calculate!
    • Filter to fewer samples and fewer taxa to speed it up (Before launching the app)
  • There are many distances available, feel free to try out ones we haven’t talked about
    • BUT:
      • You shouldn’t use a distance that you don’t understand in your actual work, even if the plot looks nice! ;)
      • Some of them might not work…
        • They are mostly implemented in the package vegan and I haven’t tested them all
        • Errors will appear in the RStudio R console
        • You can report to me any distances that don’t work if you’re feeling helpful!
  • There are other ordination methods available in ord_explore, which we haven’t discussed
    • We will discuss PCA and various transformations after dinner!
    • Some things we won’t have time to cover, but you can look here for info on topics like constrained ordination –> TODO: insert gusta me ecology website link
# fire up the shiny app
# run these lines in your console
shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  # calculate new sample variables with dominant taxon (optional)
  ps_calc_dominant(rank = 'genus', none = "Mixed", other = "Other") %>% 
  # launch a Shiny app in your web browser!
  ord_explore()
# different options
# run this line in your console
shao19 %>% 
  ps_filter(family_role == 'mother') %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  # calculate a few sample variables for interest (optional)
  ps_calc_dominant(rank = 'genus', none = "Mixed", other = "Other") %>% 
  ps_calc_diversity(rank = 'genus', index = 'shannon') %>% 
  ps_calc_richness(rank = 'genus', index = 'observed') %>% 
  # launch a Shiny app in your web browser!
  ord_explore()

DINNER BREAK


Beta diversity (part 2)

PERMANOVA:

Permutational multivariate analysis of variance.

  • ANOVA - analysis of variance (statistical modelling approach)
  • Multivariate - more than one dependent variable (multiple taxa!)
  • Permutational - statistical significance estimates obtained by shuffling the data many times

(Sometimes also called NP-MANOVA, non-parametric MANOVA e.g. on gusta me - insert link TODO)

TLDR: Are those groups on the PCoA actually different??

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_agg(rank = 'genus') %>% 
  dist_calc(dist = 'bray') %>% 
  ord_calc(method = 'PCoA') %>% 
  ord_plot(alpha = 0.6, size = 2, color = 'birth_mode') +
  theme_classic(12) +
  coord_fixed(0.7) +
  stat_ellipse(aes(color = birth_mode)) +
  scale_color_brewer(palette = 'Set1')

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_agg(rank = 'genus') %>% 
  dist_calc(dist = 'bray') %>% 
  dist_permanova(variables = 'birth_mode', n_perms = 99, seed = 123) %>% 
  perm_get()
## 2022-05-19 23:03:56 - Starting PERMANOVA with 99 perms with 1 processes
## 2022-05-19 23:03:56 - Finished PERMANOVA
## Permutation test for adonis under NA model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 99
## 
## vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
##             Df SumOfSqs      R2      F Pr(>F)   
## birth_mode   1   13.790 0.12366 42.898   0.01 **
## Residual   304   97.727 0.87634                 
## Total      305  111.518 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Use more permutations for a more reliable p.value in your real work (slower)
# Set a random seed number for reproducibility of this stochastic method

You can see from the model output that the p value, Pr(>F) is below 0.05. So there is good statistical evidence that the bacterial gut microbiota composition of c-section delivered infants has a different composition than vaginally delivered infants at 4 days of age.

You should also report that you used Bray-Curtis dissimilarities, calculated on genera. (after keeping only unique taxa with a prevalence of at least 2.5%!)

It’s probably a good idea to decide on a couple of appropriate distance measures up front for these tests, and report both (at least in supplementary material), as the choice of distance measure can affect results and conclusions!

You can also adjust for covariates in PERMANOVA, and often should. Let’s fit a more complex model, adjusting for infant sex, birth weight, and the total number of assigned reads.

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_agg(rank = 'genus') %>% 
  dist_calc(dist = 'bray') %>% 
  dist_permanova(
    variables = c('birth_mode', 'sex', 'birth_weight', 'number_reads'),
    n_perms = 99, seed = 111
  ) %>% 
  perm_get()
## Dropping samples with missings: 15
## 2022-05-19 23:03:56 - Starting PERMANOVA with 99 perms with 1 processes
## 2022-05-19 23:03:57 - Finished PERMANOVA
## Permutation test for adonis under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 99
## 
## vegan::adonis2(formula = formula, data = metadata, permutations = n_perms, by = by, parallel = parall)
##               Df SumOfSqs      R2       F Pr(>F)   
## birth_mode     1   10.794 0.10163 34.8045   0.01 **
## sex            1    0.280 0.00264  0.9031   0.43   
## birth_weight   1    0.565 0.00532  1.8215   0.06 . 
## number_reads   1    2.873 0.02705  9.2656   0.01 **
## Residual     286   88.696 0.83509                  
## Total        290  106.211 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Use more permutations for a more reliable p.value in your real work (slower)
# Set a random seed number for reproducibility of this stochastic method

Euclidean distances

What about Euclidean distances?

What are those? Slides

  • Show pythagoras \(c = \sqrt{a^2 + b^2}\) and explain generalisation to more dimensions, where each dimension is the counts of one taxon.

Issues

  • Sensitive to sparsity (double-zero problem) –> filter rare taxa
  • Excessive emphasis on high-abundance taxa –> transform features first
  • The PCoA looks weird! most samples bunched in the middle with spindly projections..
shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_agg(rank = 'genus') %>% 
  dist_calc(dist = 'euclidean') %>% 
  ord_calc(method = 'PCoA') %>% 
  ord_plot(alpha = 0.6, size = 2) +
  theme_classic(12) +
  coord_fixed(0.7) + 
  geom_rug(alpha = 0.1)

Abundance transformation

We already did two transformations with tax_transform(): binary (for Binary Jaccard distances) and compositional (for barplots).

Now we need log transformations, and the centered-log-ratio, CLR, transformation.

Log transformation

First let’s look at the abundance again, this time with heatmaps.

# Getting the taxa in abundance order up front
# to keep it consistent across multiple plots
shao4_sorted <- shao4 %>% 
  tax_sort(by = sum, at = 'genus', trans = 'compositional', tree_warn = FALSE)

Each column is a sample (from an infant), and each row is a taxon.

shao4_sorted %>% 
  tax_transform(trans = 'identity', rank = 'genus') %>% 
  comp_heatmap(
    samples = 1:30, taxa = 1:20, grid_lwd = 2, name = 'Counts',
    tax_seriation = 'Identity', sample_seriation = "Identity"
  )

shao4_sorted %>% 
  tax_transform(trans = 'compositional', rank = 'genus') %>% 
  comp_heatmap(
    samples = 1:30, taxa = 1:20, grid_lwd = 2, name = 'Prop.',
    tax_seriation = 'Identity', sample_seriation = "Identity"
  )

Even though we have picked the top 20 most abundant genera, there are still a lot of zeros, We need to deal with the zeros, because log(0) is undefined. The solution is to add a small amount to every value (or just every zero), before applying the log transformation. This small value is often called a pseudo-count.

What value should we use for the pseudo-count?

One option is to just add 1, and another popular option is to add half of the smallest observed real value (from across the whole dataset).

shao4_sorted %>% 
  tax_transform(rank= 'genus', trans = 'log10', zero_replace = 1) %>% 
  comp_heatmap(
    samples = 1:30, taxa = 1:20, grid_lwd = 2, name = 'log10\n(x+1)', 
    tax_seriation = 'Identity', sample_seriation = "Identity"
  )

shao4_sorted %>% 
  tax_agg(rank = 'genus') %>% 
  # tax_transform(trans = 'compositional') %>% # compositional also possible
  tax_transform(trans = 'log10', zero_replace = 'halfmin', chain = TRUE) %>% 
  comp_heatmap(
    samples = 1:30, taxa = 1:20, grid_lwd = 2, name = 'log10\nhalfmin', 
    tax_seriation = 'Identity', sample_seriation = "Identity"
  )

Log-transformation, zero replacement summary (keep it simple and record your approach).

Centered log ratio transformation:

Compositionality - centered-log-ratio transformation

The centered log-ratio (clr) transformation uses the geometric mean of the sample vector as the reference

shao4_sorted %>% 
  tax_agg(rank = 'genus') %>% 
  # tax_transform(trans = 'compositional') %>% # compositional also possible
  tax_transform(trans = 'clr', zero_replace = 'halfmin', chain = TRUE) %>% 
  comp_heatmap(
    samples = 1:30, taxa = 1:20, grid_lwd = 2, name = 'CLR\nhalfmin', 
    colors = heat_palette(sym = TRUE),
    tax_seriation = 'Identity', sample_seriation = "Identity"
  )

Overview of CoDa problem - slides only.

The sequencing data gives us relative abundances, not absolute abundances. The total number of reads sequenced per sample is an arbitrary total.

If one taxon blooms, whilst everything else stays stable, the relative abundance of all other taxa must (appear to) go down.

This leads to two main types of problem:

  • interpretation caveats: see differential abundance section later
  • statistical issues: taxon abundances are not independent, but (weakly?) negatively correlated

This is worse with simpler ecosystems. There is the same problem in theory with RNAseq data, but I suspect it is less bothersome because there are many more competing “species” of RNA transcript than there are bacterial species in even a very complex microbiome.

The centered-log-ratio transformation (along with some other similar ratio transformations) are claimed to help with the statistical issues by transforming the abundances from the simplex to the real space.

Practically, the CLR transformation involves finding the geometric mean of each sample, and then dividing abundance of each taxon in that sample by this geometric mean. Finally you take the natural log of this ratio.

For more details, check out Gloor 2017, and work by Thomas Quinn. TODO: link.

PCA

Principal Components Analysis.

Quite similar to Principal Co-ordinates Analysis.

In fact, PCA produces equivalent results to PCoA with euclidean distances. So let’s perform the CLR-transform first and check PCA and euclidean PCoA are the same.

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_transform(rank = 'genus', trans = 'clr', zero_replace = 'halfmin') %>% 
  dist_calc(dist = 'euclidean') %>% 
  ord_calc(method = 'PCoA') %>% 
  ord_plot(alpha = 0.6, size = 2, color = 'birth_mode') +
  theme_classic(12) +
  coord_fixed(0.7) 

shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_transform(rank = 'genus', trans = 'clr', zero_replace = 'halfmin') %>% 
  ord_calc(method = 'PCA') %>% 
  ord_plot(alpha = 0.6, size = 2, color = 'birth_mode') +
  theme_classic(12) +
  coord_fixed(0.7)

So why is PCA interesting for us? Because the Principal components are built directly from a (linear) combination of the original features.

That means we know how much each taxon contributes to each PC axis, and we can plot this information (loadings) as arrows, alongside the sample points.

pca <- shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_transform(rank = 'genus', trans = 'clr', zero_replace = 'halfmin') %>% 
  ord_calc(method = 'PCA') %>% 
  ord_plot(
    alpha = 0.6, size = 2, color = 'birth_mode',
    plot_taxa = 1:6, tax_vec_length = 0.275,
    tax_lab_style = tax_lab_style(
      type = 'text', max_angle = 90, aspect_ratio = 0.7, 
      size = 3, fontface = 'bold'
    ),
  ) +
  theme_classic(12) +
  coord_fixed(0.7, clip = 'off')
pca

How to interpret the taxa loading vectors? Cautiously.

The relative length and direction of an arrow indicates how much that taxon contributes to the variation on each visible PC axis. e.g. Variation in Enterococcus contributes quite a lot to variation along the PC2 axis.

This allows you to infer that samples positioned at the bottom of the plot will tend to have higher relative abundance of Enterococcus than samples at the top of the plot.

Interestingly, samples on the right of the plot (which tend to be vaginally-delivered infants) seem to have relatively more Bifidobacterium, Bacteroides and Escherichia, whilst the C-section born infants have relatively more Veillonella.

There are caveats and nuance to the interpretation of these plots, which are called PCA bi-plots, and you can read more about here: TODO link to appropriate gusta me ecology page.

(Side note, Phocaeicola were considered part of Bacteroides until this year!)

Iris plot

You might have already noticed this pattern, when exploring and making barplots interactively with ord_explore earlier. We can make another kind of barplot now, using the PCA information to order our samples in a circular layout.

iris <- shao4 %>% 
  tax_filter(min_prevalence = 2.5/100, verbose = FALSE) %>% 
  tax_transform(rank = 'genus', trans = 'clr', zero_replace = 'halfmin') %>% 
  ord_calc(method = 'PCA') %>% 
  ord_plot_iris(
    tax_level = 'genus', n_taxa = 12, other = "Other", 
    anno_colour = 'birth_mode', anno_colour_style = list(alpha = 0.6, size = 0.6, show.legend = FALSE)
  )
iris

patchwork::wrap_plots(pca, iris, nrow = 1, guides = 'collect')

Taxon stats

Plots and statistics with individual taxa - (simple) “differential abundance testing”

Part D:

Go to DADA2 tutorial. Adapt the 2020 tutorial.

Do I have time to include this… maybe just keep it as an extension/ homework..

End workshop with recap and notes/pointers on current trends and topics not covered?

Session info

Records your package versions etc. Useful for debugging / reproducing analysis.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       macOS Big Sur/Monterey 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Amsterdam
##  date     2022-05-19
##  pandoc   2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  ! package          * version  date (UTC) lib source
##  P ade4               1.7-19   2022-04-19 [?] CRAN (R 4.2.0)
##  P ape                5.6-2    2022-03-02 [?] CRAN (R 4.2.0)
##    assertthat         0.2.1    2019-03-21 [1] CRAN (R 4.2.0)
##    Biobase            2.56.0   2022-04-26 [2] Bioconductor
##    BiocGenerics       0.42.0   2022-04-26 [2] Bioconductor
##  P biomformat         1.24.0   2022-04-26 [?] RSPM (R 4.2.0)
##    Biostrings         2.64.0   2022-04-26 [2] Bioconductor
##    bitops             1.0-7    2021-04-24 [2] CRAN (R 4.2.0)
##    brio               1.1.3    2021-11-30 [1] CRAN (R 4.2.0)
##  P bslib              0.3.1    2021-10-06 [?] CRAN (R 4.2.0)
##    cachem             1.0.6    2021-08-19 [1] CRAN (R 4.2.0)
##    callr              3.7.0    2021-04-20 [1] CRAN (R 4.2.0)
##    circlize           0.4.15   2022-05-10 [1] CRAN (R 4.2.0)
##    cli                3.3.0    2022-04-25 [2] CRAN (R 4.2.0)
##    clue               0.3-60   2021-10-11 [1] CRAN (R 4.2.0)
##    cluster            2.1.3    2022-03-28 [2] CRAN (R 4.2.0)
##    codetools          0.2-18   2020-11-04 [2] CRAN (R 4.2.0)
##    colorspace         2.0-3    2022-02-21 [2] CRAN (R 4.2.0)
##    ComplexHeatmap     2.12.0   2022-04-26 [1] Bioconductor
##  P corncob            0.2.0    2021-03-11 [?] CRAN (R 4.2.0)
##    crayon             1.5.1    2022-03-26 [2] CRAN (R 4.2.0)
##  P data.table         1.14.2   2021-09-27 [?] CRAN (R 4.2.0)
##  P DBI                1.1.2    2021-12-20 [?] CRAN (R 4.2.0)
##    desc               1.4.1    2022-03-06 [1] CRAN (R 4.2.0)
##    devtools           2.4.3    2021-11-30 [1] CRAN (R 4.2.0)
##    digest             0.6.29   2021-12-01 [2] CRAN (R 4.2.0)
##    doParallel         1.0.17   2022-02-07 [1] CRAN (R 4.2.0)
##  P dplyr            * 1.0.9    2022-04-28 [?] CRAN (R 4.2.0)
##    ellipsis           0.3.2    2021-04-29 [2] CRAN (R 4.2.0)
##  P evaluate           0.15     2022-02-18 [?] CRAN (R 4.2.0)
##    fansi              1.0.3    2022-03-24 [2] CRAN (R 4.2.0)
##    farver             2.1.0    2021-02-28 [2] CRAN (R 4.2.0)
##  P fastmap            1.1.0    2021-01-25 [?] CRAN (R 4.2.0)
##  P forcats            0.5.1    2021-01-27 [?] CRAN (R 4.2.0)
##  P foreach            1.5.2    2022-02-02 [?] CRAN (R 4.2.0)
##  P fs                 1.5.2    2021-12-08 [?] CRAN (R 4.2.0)
##  P generics           0.1.2    2022-01-31 [?] CRAN (R 4.2.0)
##    GenomeInfoDb       1.32.1   2022-04-28 [2] Bioconductor
##    GenomeInfoDbData   1.2.8    2022-05-12 [2] Bioconductor
##    GetoptLong         1.0.5    2020-12-15 [1] CRAN (R 4.2.0)
##    ggplot2          * 3.3.6    2022-05-03 [2] CRAN (R 4.2.0)
##    ggrepel            0.9.1    2021-01-15 [1] CRAN (R 4.2.0)
##    GlobalOptions      0.1.2    2020-06-10 [1] CRAN (R 4.2.0)
##    glue               1.6.2    2022-02-24 [2] CRAN (R 4.2.0)
##    gtable             0.3.0    2019-03-25 [2] CRAN (R 4.2.0)
##  P here               1.0.1    2020-12-13 [?] CRAN (R 4.2.0)
##  P highr              0.9      2021-04-16 [?] CRAN (R 4.2.0)
##  P htmltools          0.5.2    2021-08-25 [?] CRAN (R 4.2.0)
##    httpuv             1.6.5    2022-01-05 [1] CRAN (R 4.2.0)
##  P igraph             1.3.1    2022-04-20 [?] CRAN (R 4.2.0)
##    IRanges            2.30.0   2022-04-26 [2] Bioconductor
##  P iterators          1.0.14   2022-02-05 [?] CRAN (R 4.2.0)
##  P jquerylib          0.1.4    2021-04-26 [?] CRAN (R 4.2.0)
##  P jsonlite           1.8.0    2022-02-22 [?] CRAN (R 4.2.0)
##  P knitr              1.39     2022-04-26 [?] CRAN (R 4.2.0)
##    labeling           0.4.2    2020-10-20 [2] CRAN (R 4.2.0)
##    later              1.3.0    2021-08-18 [1] CRAN (R 4.2.0)
##    lattice            0.20-45  2021-09-22 [2] CRAN (R 4.2.0)
##    lifecycle          1.0.1    2021-09-24 [2] CRAN (R 4.2.0)
##    magrittr           2.0.3    2022-03-30 [2] CRAN (R 4.2.0)
##    MASS               7.3-56   2022-03-23 [2] CRAN (R 4.2.0)
##    Matrix             1.4-1    2022-03-23 [2] CRAN (R 4.2.0)
##    matrixStats        0.62.0   2022-04-19 [2] CRAN (R 4.2.0)
##    memoise            2.0.1    2021-11-26 [1] CRAN (R 4.2.0)
##    mgcv               1.8-40   2022-03-29 [2] CRAN (R 4.2.0)
##  P microbiome         1.18.0   2022-04-26 [?] RSPM (R 4.2.0)
##    microViz         * 0.9.1    2022-05-16 [1] Github (david-barnett/microViz@07f1fdb)
##  P mime               0.12     2021-09-28 [?] CRAN (R 4.2.0)
##  P multtest           2.52.0   2022-04-26 [?] RSPM (R 4.2.0)
##    munsell            0.5.0    2018-06-12 [2] CRAN (R 4.2.0)
##    nlme               3.1-157  2022-03-25 [2] CRAN (R 4.2.0)
##    patchwork          1.1.1    2020-12-17 [1] CRAN (R 4.2.0)
##  P permute            0.9-7    2022-01-27 [?] CRAN (R 4.2.0)
##  P phyloseq         * 1.40.0   2022-04-26 [?] RSPM (R 4.2.0)
##    pillar             1.7.0    2022-02-01 [2] CRAN (R 4.2.0)
##    pkgbuild           1.3.1    2021-12-20 [1] CRAN (R 4.2.0)
##    pkgconfig          2.0.3    2019-09-22 [2] CRAN (R 4.2.0)
##    pkgload            1.2.4    2021-11-30 [1] CRAN (R 4.2.0)
##    plyr               1.8.7    2022-03-24 [2] CRAN (R 4.2.0)
##    png                0.1-7    2013-12-03 [2] CRAN (R 4.2.0)
##    prettyunits        1.1.1    2020-01-24 [1] CRAN (R 4.2.0)
##    processx           3.5.3    2022-03-25 [1] CRAN (R 4.2.0)
##    promises           1.2.0.1  2021-02-11 [1] CRAN (R 4.2.0)
##    ps                 1.7.0    2022-04-23 [1] CRAN (R 4.2.0)
##  P purrr              0.3.4    2020-04-17 [?] CRAN (R 4.2.0)
##    R6                 2.5.1    2021-08-19 [2] CRAN (R 4.2.0)
##    RColorBrewer       1.1-3    2022-04-03 [2] CRAN (R 4.2.0)
##    Rcpp               1.0.8.3  2022-03-17 [2] CRAN (R 4.2.0)
##    RCurl              1.98-1.6 2022-02-08 [2] CRAN (R 4.2.0)
##    registry           0.5-1    2019-03-05 [1] CRAN (R 4.2.0)
##  P remotes            2.4.2    2021-11-30 [?] CRAN (R 4.2.0)
##    reshape2           1.4.4    2020-04-09 [2] CRAN (R 4.2.0)
##  P rhdf5              2.40.0   2022-04-26 [?] RSPM (R 4.2.0)
##  P rhdf5filters       1.8.0    2022-04-26 [?] RSPM (R 4.2.0)
##  P Rhdf5lib           1.18.2   2022-05-15 [?] Bioconductor
##    rjson              0.2.21   2022-01-09 [1] CRAN (R 4.2.0)
##    rlang              1.0.2    2022-03-04 [2] CRAN (R 4.2.0)
##  P rmarkdown          2.14     2022-04-25 [?] CRAN (R 4.2.0)
##    rprojroot          2.0.3    2022-04-02 [1] CRAN (R 4.2.0)
##    rstudioapi         0.13     2020-11-12 [1] CRAN (R 4.2.0)
##  P Rtsne              0.16     2022-04-17 [?] CRAN (R 4.2.0)
##    S4Vectors          0.34.0   2022-04-26 [2] Bioconductor
##  P sass               0.4.1    2022-03-23 [?] CRAN (R 4.2.0)
##    scales             1.2.0    2022-04-13 [2] CRAN (R 4.2.0)
##    seriation        * 1.3.5    2022-03-28 [1] CRAN (R 4.2.0)
##    sessioninfo        1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
##    shape              1.4.6    2021-05-19 [1] CRAN (R 4.2.0)
##    shiny            * 1.7.1    2021-10-02 [1] CRAN (R 4.2.0)
##    stringi            1.7.6    2021-11-29 [2] CRAN (R 4.2.0)
##    stringr            1.4.0    2019-02-10 [2] CRAN (R 4.2.0)
##    survival           3.3-1    2022-03-03 [2] CRAN (R 4.2.0)
##    testthat           3.1.4    2022-04-26 [1] CRAN (R 4.2.0)
##    tibble             3.1.7    2022-05-03 [2] CRAN (R 4.2.0)
##  P tidyr              1.2.0    2022-02-01 [?] CRAN (R 4.2.0)
##  P tidyselect         1.1.2    2022-02-21 [?] CRAN (R 4.2.0)
##    TSP                1.2-0    2022-02-21 [1] CRAN (R 4.2.0)
##    usethis            2.1.5    2021-12-09 [1] CRAN (R 4.2.0)
##    utf8               1.2.2    2021-07-24 [2] CRAN (R 4.2.0)
##    vctrs              0.4.1    2022-04-13 [2] CRAN (R 4.2.0)
##  P vegan              2.6-2    2022-04-17 [?] CRAN (R 4.2.0)
##    withr              2.5.0    2022-03-03 [2] CRAN (R 4.2.0)
##  P xfun               0.31     2022-05-10 [?] CRAN (R 4.2.0)
##    xtable             1.8-4    2019-04-21 [1] CRAN (R 4.2.0)
##    XVector            0.36.0   2022-04-26 [2] Bioconductor
##  P yaml               2.3.5    2022-02-21 [?] CRAN (R 4.2.0)
##    zlibbioc           1.42.0   2022-04-26 [2] Bioconductor
## 
##  [1] /Users/david/Documents/my_projects/evomics/evomics-material/renv/library/R-4.2/x86_64-apple-darwin17.0
##  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
## 
##  P ── Loaded and on-disk path mismatch.
## 
## ──────────────────────────────────────────────────────────────────────────────